Using structural information and citation evidence to detect significant plagiarism cases in scientific publications

Authors

  • Salha Alzahrani
  • Vasile Palade
  • Naomie Salim
  • Ajith Abraham
Abstract

In plagiarism detection (PD) systems, two important problems should be considered: retrieving candidate documents that are globally similar to a document q under investigation, and comparing q side by side with its candidates to pinpoint plagiarized fragments in detail. In this article, the authors investigate the use of structural information of scientific publications in both problems, and the consideration of citation evidence in the second problem. Three statistical measures, namely Inverse Generic Class Frequency, Spread, and Depth, are introduced to assign a degree of importance (i.e., weight) to structural components in scientific articles. A term-weighting scheme is adjusted to incorporate component-weight factors, and is used to improve the retrieval of potential sources of plagiarism. A plagiarism screening process is applied based on a measure of resemblance, in which component-weight factors are exploited to ignore less significant or nonsignificant plagiarism cases. Using the notion of citation evidence, parts with proper citation evidence are excluded, and the remaining cases are treated as suspected and used to calculate the similarity index. The authors compare their approach to two flat-based baselines: TF-IDF weighting with a Cosine coefficient, and shingling with a Jaccard coefficient. In both baselines, they use different comparison units with overlapping measures for plagiarism screening. They conducted extensive experiments using a dataset of 15,412 documents divided into 8,657 source publications and 6,755 suspicious queries,
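The two flat baselines named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the word-level 3-shingles, the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def shingles(text, k=3):
    """Return the set of word k-grams (shingles) in text. k=3 is an assumption."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of documents.

    Uses raw term frequency and a smoothed IDF, log((1 + N) / (1 + df)) + 1.
    This weighting variant is an assumption, not taken from the article.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine coefficient of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In a setup like this, the cosine score over whole documents would serve the candidate-retrieval step, while the Jaccard score over shingle sets would serve the side-by-side comparison. Both treat a document as flat text, which is the property the structural, component-weighted approach in the abstract is contrasted with.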

Similar resources

An introduction to the examples of scientific plagiarism and its identification soft-wares

Background: Increasing immorality and plagiarism in the country's higher education system have become a serious crisis. Hence, the purpose of this study was to analyze examples of plagiarism and to introduce plagiarism detection software. Method: The present study is a narrative review. Articles in Persian and Latin related to the use of scientific theft key words in databases w...

Citation Analysis of the Most Influential Publications in Travel Medicine

Introduction: Citation analysis reflects the extent to which published work has been recognized in the scientific community. The purpose of this study was to characterize the most cited publications in travel medicine. Methods: Travel medicine articles indexed on Scopus which had been published in the English language through 2016 were retrieved independen...

A review of citation analysis and journal citation reports and their application in selecting Latin journals

Nowadays, English publications are considered as one of the significant and essential resources in university libraries. Enhancement of the price of publications along with the increase in number of published journals has made it difficult for libraries to provide all the information needed by researchers. Therefore, the necessity of a criterion for selecting superior journals is increasingly f...

Plagiarism in scientific research and publications and how to prevent it

Quality is assessed on the basis of adequate evidence, while best results of the research are accomplished through scientific knowledge. Information contained in a scientific work must always be based on scientific evidence. Guidelines for genuine scientific research should be designed based on real results. Dynamic research and use correct methods of scientific work must originate from eve...

Counting Co-occurrences in Citations to Identify Plagiarised Text Fragments

Research in external plagiarism detection is mainly concerned with the comparison of the textual contents of a suspicious document against the contents of a collection of original documents. More recently, methods that try to detect plagiarism based on citation patterns have been proposed. These methods are particularly useful for detecting plagiarism in scientific publications. In this work, w...

Journal title:
  • JASIST

Volume 63  Issue 

Pages  -

Publication date 2012